Discriminative learning from partially annotated examples
Abstract
The number of algorithms for automatically learning classifiers from examples, and of their applications, is ever growing. Most existing algorithms require a training set of completely annotated examples, which are often hard to obtain. In this thesis, we tackle the problem of learning from partially annotated examples, which means that each training input comes with a set of admissible labels, only one of which is correct. We contributed to two different cases of this scenario. In the first case, we studied the problem of learning ordinal classifiers from examples with interval annotation of labels. We designed a convex learning algorithm for this case and demonstrated its advantage empirically on real data. At the same time, we made several contributions to the supervised learning of ordinal classifiers: we proposed a new parametrization of the ordinal classifier, we introduced a more flexible piecewise version of the ordinal classifier, and we proposed a generic cutting-plane solver with convergence guarantees. In the second case, we studied the problem of learning structured output classifiers from examples in which the annotation of a subset of labels is missing. We defined the concept of a surrogate classification-calibrated partial loss, the minimization of which guarantees that learning is statistically consistent under fairly general conditions on the data-generating process. We proved the existence of a convex classification-calibrated surrogate loss for learning from partially annotated examples. We showed which existing surrogate losses are classification calibrated and which are not. Our work thus provides the missing theoretical justification for heuristic methods that have so far been used successfully in practice.
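To make the interval-annotation setting concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes a threshold-based ordinal rule and absolute error as the base loss. The names `predict_ordinal` and `interval_insensitive_loss` are hypothetical and chosen only for illustration. The loss is zero whenever the predicted label falls inside the annotated interval, and otherwise equals the distance to the nearest interval endpoint.

```python
import numpy as np


def predict_ordinal(x, w, thresholds):
    """Hypothetical threshold rule: label = 1 + number of thresholds below the score."""
    score = float(np.dot(w, x))
    return 1 + int(np.sum(score > np.asarray(thresholds)))


def interval_insensitive_loss(y_pred, y_low, y_high):
    """Assumed absolute-error variant of a loss for interval annotations.

    Returns 0 if y_low <= y_pred <= y_high, otherwise the distance
    from y_pred to the nearest endpoint of the interval.
    """
    if y_pred < y_low:
        return y_low - y_pred
    if y_pred > y_high:
        return y_pred - y_high
    return 0


# Example: one feature, three thresholds, hence four ordered labels 1..4.
label = predict_ordinal(x=[1.0], w=[2.0], thresholds=[0.5, 1.5, 2.5])
loss = interval_insensitive_loss(label, y_low=2, y_high=4)
```

A fully annotated example is the special case `y_low == y_high`, in which this sketch reduces to the ordinary absolute error between the predicted and the true label.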
Similar resources
Interval Insensitive Loss for Ordinal Classification
We address the problem of learning an ordinal classifier from partially annotated examples. We introduce an interval-insensitive loss function to measure the discrepancy between the predictions of an ordinal classifier and a partial annotation provided in the form of intervals of admissible labels. The proposed interval-insensitive loss is an instance of loss functions previously used for learning of differ...
Weakly supervised discriminative localization and classification: a joint learning process
Visual categorization problems, such as object classification or action recognition, are increasingly often approached using a detection strategy: a classifier function is first applied to candidate subwindows of the image or the video, and then the maximum classifier score is used for class decision. Traditionally, the subwindow classifiers are trained on a large collection of examples manuall...
Name Tagging with Word Clusters and Discriminative Training
We present a technique for augmenting annotated training data with hierarchical word clusters that are automatically derived from a large unannotated corpus. Cluster membership is encoded in features that are incorporated in a discriminatively trained tagging model. Active learning is used to select training examples. We evaluate the technique for named-entity tagging. Compared with a state-of-...
Learning discriminative localization from weakly labeled data
Visual categorization problems, such as object classification or action recognition, are increasingly often approached using a detection strategy: a classifier function is first applied to candidate subwindows of the image or the video, and then the maximum classifier score is used for class decision. Traditionally, the subwindow classifiers are trained on a large collection of examples manuall...
Learning Transferable Representation for Bilingual Relation Extraction via Convolutional Neural Networks
Typically, relation extraction models are trained to extract instances of a relation ontology using only training data from a single language. However, the concepts represented by the relation ontology (e.g. ResidesIn, EmployeeOf) are language independent. The numbers of annotated examples available for a given ontology vary between languages. For example, there are far fewer annotated examples...